
implement getBlobs and bypass blob gossip validation on successful blob retrievals from EL #6913

Open · wants to merge 15 commits into base: unstable
Conversation

@agnxsh (Contributor) commented Feb 10, 2025

ethereum/execution-apis#559

If the EL supports engine_getBlobsV1:

  • makes the block available as soon as the CL gets all the required blobs from the blob pool
  • populates the blob quarantine with that info (should probably verify proofs and inclusion once, like gossip does?)
  • pops from the blob quarantine and adds the blobs to the block processor

Otherwise:

  • does everything the conventional way

Note: getBlobs in nimbus-eth2 here is made backward compatible from Electra, NOT Deneb (Deneb is what the spec suggests).
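The branch above can be sketched roughly as follows. This is an illustrative Python sketch, not nimbus-eth2 code: `el.supports`, `el.get_blobs`, and the block/commitment shapes are hypothetical stand-ins for the real engine-API plumbing.

```python
def import_block(block, el):
    """Sketch: try the EL blob pool first; if the EL cannot serve a
    complete set of blobs, fall back to the conventional gossip path."""
    commitments = block["kzg_commitments"]
    if el.supports("engine_getBlobsV1"):
        blobs = el.get_blobs([c["versioned_hash"] for c in commitments])
        if blobs is not None and len(blobs) == len(commitments):
            # All blobs served by the EL: mark the block available and
            # hand the blobs to the block processor via the quarantine.
            return ("el", blobs)
    # EL unsupported, or incomplete response: conventional gossip path.
    return ("gossip", None)
```

The key property is that a partial EL response is treated the same as no response, so the gossip path remains the single fallback.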

Pros

  • skips KZG proof verification and inclusion-proof verification for 9 blobs per slot (assuming the EL can be trusted)
  • faster block import (in the happy case)

Cons

  • Relying heavily on the EL mempool opens up various threats. However, our block processor runs another KZG proof check before persisting the block and blobs; on failure, the RequestManager should be able to re-fetch the otherwise malicious blob/proof data sent by the bad EL. The EL can be quickly swapped out, in which case Nimbus would rely on blob gossip for that entire duration and then switch back to optimistically pulling blobs from the EL.
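The safety net described in the con above can be sketched as follows — again an illustrative Python sketch, where `verify_kzg` and `request_manager.fetch_blobs` are hypothetical stand-ins for the block processor's KZG re-check and the RequestManager's network fetch:

```python
def persist_with_recheck(block, blobs, verify_kzg, request_manager):
    """Sketch: even when blobs came from the EL, re-run KZG verification
    before persisting; on failure, discard the EL-provided blobs and
    re-fetch them from peers via the request manager."""
    if verify_kzg(block, blobs):
        return blobs  # EL-provided blobs check out; persist as-is.
    # Bad EL: fall back to fetching the blobs over the network.
    return request_manager.fetch_blobs(block)
```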

CI failing because of some nim-web3 upstream incompatibilities.

github-actions bot commented Feb 10, 2025

Unit Test Results

12 files −3 · 2 089 suites −525 · 1h 5m 24s ⏱️ −9m 22s
6 412 tests ±0 · 5 891 ✔️ ±0 · 521 💤 ±0 · 0 ±0
35 681 runs −8 935 · 35 027 ✔️ −8 871 · 654 💤 −64 · 0 ±0

Results for commit 086066a. ± Comparison against base commit 3ddcab5.

♻️ This comment has been updated with latest results.

@agnxsh (Author) commented Feb 11, 2025

For devnet testing locally, use ethpandaops/nimbus-eth2:getBlobs.

```nim
when consensusFork >= ConsensusFork.Electra:
  # Pull blobs and proofs from the EL blob pool
  let blobsFromElOpt = await node.elManager.sendGetBlobs(forkyBlck)
  debugEcho "pulled blobs from el"
```
Reviewer (Contributor) commented:

Obligatory comment about debugEcho (but yeah, it's a draft PR and useful for now)

@agnxsh (Author) replied:

Just for now, until I see things are OK in the local devnet.

@agnxsh agnxsh force-pushed the getBlobs branch 3 times, most recently from 6ad1b3d to 03e11ee Compare February 14, 2025 06:00
@agnxsh commented Feb 14, 2025


```text
2025-02-14 13:24:27 INF 2025-02-14 07:54:27.006+00:00 Slot end                                   topics="beacnde" slot=234 nextActionWait=1s993ms123us121ns nextAttestationSlot=235 nextProposalSlot=236 syncCommitteeDuties=current head=4932c763:234
2025-02-14 13:24:28 WRN 2025-02-14 07:54:28.713+00:00 Peer count low, no new peers discovered    topics="networking" discovered_nodes=0 new_peers=@[] current_peers=1 wanted_peers=160
2025-02-14 13:24:29 INF 2025-02-14 07:54:28.999+00:00 Slot start                                 topics="beacnde" nextFork=Fulu:100000001 head=4932c763:234 delay=137us922ns finalized=5:5177fdfd peers=1 slot=235 sync=synced epoch=7
2025-02-14 13:24:29 pulled blobs from el
2025-02-14 13:24:29 9
2025-02-14 13:24:31 INF 2025-02-14 07:54:31.295+00:00 Attestation sent                           topics="beacval" attestation="(committee_index: 0, attester_index: 28, data: (slot: 235, index: 0, beacon_block_root: \"61632bd0\", source: \"6:f956b1f3\", target: \"7:a67fac0a\"), signature: \"929f5919\")" delay=-1s703ms988us327ns subnet_id=11
2025-02-14 13:24:31 INF 2025-02-14 07:54:31.295+00:00 Attestation sent                           topics="beacval" attestation="(committee_index: 0, attester_index: 52, data: (slot: 235, index: 0, beacon_block_root: \"61632bd0\", source: \"6:f956b1f3\", target: \"7:a67fac0a\"), signature: \"b5d6aa5c\")" delay=-1s703ms537us202ns subnet_id=11
2025-02-14 13:24:31 INF 2025-02-14 07:54:31.296+00:00 Sync committee message sent
```

We're finalising with Nimbus running getBlobs in a devnet.

@agnxsh commented Feb 14, 2025

We can make better decisions about whether to validate blob gossip or pull blobs from the EL every slot by caching the engine_exchangeCapabilities response; however, a good refresh interval needs to be chosen, since ELs can be swapped at runtime.
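The caching idea above could look roughly like this. This is a sketch under stated assumptions: the `fetch` callable standing in for an engine_exchangeCapabilities request and the 60-second TTL are both illustrative, not values from this PR.

```python
import time

class CapabilityCache:
    """Sketch: cache the EL's advertised capabilities with a TTL, so the
    CL re-queries periodically in case the EL was swapped at runtime."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning a list of method names
        self._ttl = ttl_seconds
        self._clock = clock
        self._caps = None
        self._fetched_at = -float("inf")  # forces a fetch on first use

    def supports(self, method):
        now = self._clock()
        if now - self._fetched_at >= self._ttl:
            self._caps = set(self._fetch())
            self._fetched_at = now
        return method in self._caps
```

Per-slot checks like "should we call engine_getBlobsV1 this slot?" then hit the cache instead of issuing a round trip to the EL every time.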

@agnxsh agnxsh marked this pull request as ready for review February 16, 2025 21:03
@agnxsh agnxsh changed the title first draft of getBlobs implement getBlobs and bypass blob gossip validation on successful blob retrievals from EL Feb 16, 2025
@agnxsh commented Mar 3, 2025

https://discordapp.com/channels/595666850260713488/1252403418941624532/1341996653149945927

Interesting mainnet research from Jimmy (Lighthouse) suggests that the EL misses 0–8 blobs per minute; hence we should bypass gossip validation for blobs only when the EL can serve a complete response (i.e., blobs returned by the EL == KZG commitments length).

An alternative would be to fetch most blobs from the EL for every block and the rest from gossip validation, but in that case we'd likely spend time working out which blobs we missed from the EL, which isn't very ideal.
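The resulting policy is a small predicate — bypass only on a complete response, and treat a partial response exactly like no response. A minimal Python sketch (function name is illustrative):

```python
def should_bypass_gossip(el_blobs, commitments):
    """Sketch: bypass blob gossip validation only when the EL served one
    blob per KZG commitment; partial or missing responses fall back to
    the normal gossip validation path."""
    return el_blobs is not None and len(el_blobs) == len(commitments)
```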

(Grafana screenshot for reference.)

```nim
let block_root = hash_tree_root(block_header)

let vEl =
  await self.validateBlobSidecarFromEL(block_root)
```
Reviewer (Contributor) commented:

It seems like in the case of a pathological EL, this could delay things by 1 second?
